Abstract

We study the dynamical behavior of the Chinese stock markets by investigating the
statistical properties of daily ensemble returns and varieties, defined respectively
as the mean and the standard deviation of the daily price returns of a
portfolio of stocks traded in China's stock markets on a given day. The distribution
of the daily ensemble returns has an exponential form in the center and power-law
tails, while the variety distribution is log-Gaussian in the bulk followed by a
power-law tail for large varieties. Based on detrended fluctuation analysis, R/S analysis
and modified R/S analysis, we find evidence of long memory in the ensemble returns
and strong evidence of long memory in the evolution of variety.

Key words: Econophysics, Ensemble return, Variety, Probability distribution, 
Long memory, Statistical test 



1 Introduction 



Financial markets are complex systems in which participants interact with
each other and react to external news, attempting to gain extra earnings by
beating the markets. Over the last decades, econophysics has flourished
since the pioneering work of Mantegna and Stanley in 1995 [1]. Econophysics
is an emerging interdisciplinary field in which theories, concepts, and
tools borrowed from statistical mechanics, nonlinear sciences, mathematics,
and complexity sciences are applied to understand the self-organized complex
behaviors of financial markets [2,3,4]. Econophysicists have discovered or
rediscovered numerous stylized facts of financial markets [2,5], such as fat tails
of return distributions [6,1,7,8,9,10,11,12,13], absence of autocorrelations of
returns [2], long memory in volatility [14,15,16], intermittency and
multifractality [7,17,18,19], and the leverage effect [20,21], to list a few.



Recently, Lillo and Mantegna introduced the concept of ensemble variables,
treating a portfolio of stocks as a whole [22,23,24,25]. They defined
two quantities, the ensemble return and the variety. The ensemble return $\mu$
is the mean of the returns of the portfolio at time t, which is a measure of
the market direction, while the variety $\sigma$ is the standard deviation of all the
returns at time t, which characterizes how different the behavior of stocks
is. In time periods when the markets are very volatile, the ensemble
returns have larger fluctuations and the varieties are larger as well. It is
interesting to note that there are sharp peaks in the variety time series when
the market crashes [24,25], which is reminiscent of the behavior of volatility.
In addition, the daily ensemble return of stocks in the New York Stock
Exchange is found to be uncorrelated, while the daily variety has long memory
[23]. Despite such remarkable similarities shared by the ensemble returns
and the returns, and by the varieties and the volatilities, there are significant
differences between these "competing" quantities, especially in the shapes of the
corresponding distributions.



A large number of studies in the literature show that emerging
stock markets behave differently from developed markets in many
aspects. In most developed markets, the daily returns have well-established
fat tails, while the distributions of daily returns are exponential in several
emerging markets, e.g., in China [26], Brazil [27], and India [28]. It is therefore
interesting to investigate the statistical properties of the ensemble variables
extracted from emerging stock markets, which is the main motivation of the
current work. We shall focus on the Chinese stock markets in this paper.



The paper is organized as follows. In Sec. 2, we explain the data set analyzed 
and define explicitly the ensemble return and variety. Section 3 presents analy- 
sis on the probability distributions of the daily ensemble returns and varieties. 
We discuss in Sec. 4 the temporal correlations of the two quantities, where we 
adopt R/S analysis and detrended fluctuation analysis (DFA) to estimate the 
Hurst indexes and perform statistical tests using Lo's modified R/S statistic. 
The last section concludes. 
2 China's stock markets



Before the foundation of the People's Republic of China in 1949, the Shanghai
Stock Exchange was the third largest worldwide, after the New York Stock
Exchange and the London Stock Exchange, and its evolution over the period from
1919 to 1949 had enormous influence on other world-class financial markets
[29]. After 1949, China implemented policies of a socialist planned economy
and the government entirely controlled all investment channels. This proved to
be efficient in the early stage of economic reconstruction, especially for
heavy industry. However, planned economic policies unavoidably led to
inefficient allocation of resources. In 1981, the central government began to
issue treasury bonds to raise capital to cover its financial deficit, which reopened
China's securities markets. After that, local governments and enterprises
were permitted to issue bonds. In 1984, 11 state-owned enterprises became 
share-holding corporations and started to provide public offering of stocks. 
The establishment of secondary markets for securities occurred in 1986 when 
over-the-counter markets were set up to trade corporation bonds and shares. 
The first market for government-approved securities was founded in Shanghai 
on November 26, 1990 and started operating on December 19 of the same 
year under the name of the Shanghai Stock Exchange (SHSE). Shortly after, 
the Shenzhen Stock Exchange (SZSE) was established on December 1, 1990 
and started its operations on July 3, 1991. The historical high happened in 
2000 when the total market capitalization reached 4,968 billion yuan (55.5% 
of GDP) with 1,535.4 billion yuan of float market capitalization (17.2% of 
GDP). The size of the Chinese stock market has increased remarkably. 

The data set we used in this paper contains daily records of n = 500 stocks
traded on the SHSE and the SZSE in the period from February 1994 to September
2004. The total number of data points exceeds one million. For each stock
price time series, we calculate the daily log-return as follows

$$ r_i(t) = \ln\left[P_i(t)/P_i(t-1)\right] , \tag{1} $$

where $P_i(t)$ is the closing price of stock i on day t. The ensemble return $\mu(t)$ is
then defined by

$$ \mu(t) = \frac{1}{n} \sum_{i=1}^{n} r_i(t) , \tag{2} $$

while the variety $\sigma(t)$ is defined according to

$$ \sigma^2(t) = \frac{1}{n} \sum_{i=1}^{n} \left[r_i(t) - \mu(t)\right]^2 . \tag{3} $$

The number of active stocks may vary with time t. When a stock j is not
traded at time t, it is not included in the calculation of $\mu$ and $\sigma$.
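
As an illustration of Eqs. (1)-(3), the following minimal sketch computes the two ensemble variables with NumPy. The function name `ensemble_return_and_variety`, the (days x stocks) array layout, and the use of NaN to mark a stock that is not traded on a given day are assumptions made for this example; they are not details given for the data set.

```python
import numpy as np

def ensemble_return_and_variety(prices):
    """Return (mu, sigma) per day from a (T, n) array of closing prices.

    Assumption: prices[t, i] is NaN when stock i is not traded on day t.
    """
    prices = np.asarray(prices, dtype=float)
    # Eq. (1): daily log-returns; NaN whenever either price is missing.
    r = np.log(prices[1:] / prices[:-1])
    # Eq. (2): mean over the stocks that actually have a return on day t.
    mu = np.nanmean(r, axis=1)
    # Eq. (3): population standard deviation (1/n normalization, ddof=0).
    sigma = np.nanstd(r, axis=1)
    return mu, sigma

# Toy usage: two stocks, three days; stock 2 is not traded on the last day.
prices = np.array([[1.0, 1.0],
                   [np.e, 1.0],
                   [np.e, np.nan]])
mu, sigma = ensemble_return_and_variety(prices)
```

Excluding missing returns with `nanmean` and `nanstd` means the effective n changes from day to day, which mirrors the remark that the number of active stocks may vary with time.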
Figure 1 illustrates the daily ensemble returns $\mu$ and the daily variety $\sigma$ in
the Chinese stock market as a function of time t from Feb. 1994 to Sep. 2004.
A striking feature observed in both quantities is that the amplitude of the
envelope decreases with time, which indicates that the Chinese stock markets
are becoming less volatile and more efficient.

Fig. 1. Evolution of the daily ensemble return $\mu$ (top) and the daily variety $\sigma$
(bottom) as a function of time.



3 Probability distributions of ensemble variables 



The central parts of the ensemble return distributions of NYSE stocks and Nasdaq stocks
are exponential and the negative part decays more slowly than the
positive part [23,25], while the tails look like outliers in the sense that those
ensemble returns are extremely large and cannot be modeled by the same
exponential distribution as the central part [4]. The Chinese stock markets
show qualitatively the same behavior. Figure 2 shows the empirical probability
density function of $\mu$. We find that the main part of the density function has
the following form

$$ f(\mu) \sim \exp\left[-k_{\pm} |\mu|\right] , \tag{4} $$

where $k_- = 76.1 \pm 3.3$ when $-0.06 < \mu < 0$ and $k_+ = 83.2 \pm 3.5$ when
$0 < \mu < 0.06$, which shows that the distribution is asymmetric, with the
skewness being 0.378. It is interesting to note that the Chinese stock markets
have more large ensemble returns than the USA markets, which is consistent
with the fact that the Chinese stock markets are extraordinarily volatile.
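
The fit of the exponential form (4) can be sketched as a straight-line fit of the logarithm of a histogram against $|\mu|$ on each side of zero. The helper name `fit_exponential_rate`, the number of bins, and the synthetic Laplace sample below are assumptions for illustration only; the fitting procedure actually used for the reported slopes is not specified beyond "least squares".

```python
import numpy as np

def fit_exponential_rate(mu, lo, hi, bins=30):
    """Estimate k in f(mu) ~ exp(-k |mu|) from histogram counts in [lo, hi]."""
    mu = np.asarray(mu, dtype=float)
    counts, edges = np.histogram(mu, bins=bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                      # log of an empty bin is undefined
    # Straight-line fit of ln(counts) against |mu|; the slope is -k.
    slope, _ = np.polyfit(np.abs(centers[keep]), np.log(counts[keep]), 1)
    return -slope

# Synthetic check on a symmetric Laplace sample with rate 80 (not market data).
rng = np.random.default_rng(1)
sample = rng.laplace(loc=0.0, scale=1.0 / 80.0, size=200_000)
k_minus = fit_exponential_rate(sample, -0.06, 0.0)
k_plus = fit_exponential_rate(sample, 0.0, 0.06)
```

Because only the slope is used, the normalization of the histogram does not matter, so raw counts are fitted directly.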

Fig. 2. Empirical probability density function of $\mu$. The circles represent the real
data. The solid lines are the least squares fits to the exponential (4) in the ranges
[-0.06, 0] and [0, 0.06], and the slopes are 76.1 ± 3.3 and 83.2 ± 3.5, respectively.

In order to explore the tail distribution of the ensemble returns, we adopt
the rank-ordering approach [9,30]. We first sort the n observations in
non-increasing order, that is, $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_R \ge \cdots \ge \mu_n$, where R is the rank
of the observations. Let $C(\mu) = \int_{\mu}^{\infty} f(\mu') \, d\mu'$; then we have

$$ n C(\mu_R) = R . \tag{5} $$

When the probability density of the ensemble variable $\mu$ scales as
$f(\mu) \sim \mu^{-(1+\alpha)}$ in the tail, we have [9,30]

$$ \mu_R \sim R^{-1/\alpha} \tag{6} $$

for $R \ll n$. A rank-ordering plot of $\ln \mu_R$ against $\ln R$ thus allows us to check
whether the tails have a power-law form.
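
A minimal sketch of the rank-ordering estimate follows: sort the observations, regress $\ln \mu_R$ on $\ln R$ over the first ranks, and read the tail exponent from the slope via Eq. (6). The function name `rank_ordering_exponent`, the cutoff parameter `r_max`, and the synthetic input are assumptions introduced for this example.

```python
import numpy as np

def rank_ordering_exponent(values, r_max):
    """Estimate the tail exponent alpha from the r_max largest observations."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]   # non-increasing order
    ranks = np.arange(1, r_max + 1)
    # Eq. (6): ln(mu_R) = const - (1/alpha) ln(R), so the slope is -1/alpha.
    slope, _ = np.polyfit(np.log(ranks), np.log(v[:r_max]), 1)
    return -1.0 / slope

# Synthetic check: exact quantiles mu_R = (R/n)^(-1/alpha) with alpha = 3.
n = 1000
synthetic = (np.arange(1, n + 1) / n) ** (-1.0 / 3.0)
alpha_hat = rank_ordering_exponent(synthetic, r_max=100)
```

For the negative tail, the same function can be applied to the absolute values of the negative observations.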

Figure 3 shows the rank-ordering plot of both positive and negative tails, which
are approximately power laws. The fitted tail exponents are $\alpha_- = 3.33 \pm 0.06$
for negative $\mu$ and $\alpha_+ = 2.86 \pm 0.07$ for positive $\mu$. This is reminiscent of
the inverse cubic law of returns [10,31,32].

Figure 4 shows the distribution of varieties of the Chinese stock markets.
It is evident that the main part of the distribution follows a log-normal form
followed by a well-established power-law tail:
Fig. 3. Log-log rank-ordering plot of $\ln \mu_R$ as a function of $\ln R$ for both positive
and negative tails (shown in the legend). The plot of the negative tail is translated
vertically for clarity. The solid lines are the least squares fits to the data in the
interval $0 < \ln R < 4$ for the negative tail and $0 < \ln R < 3.5$ for the positive tail,
and the slopes of the two lines are 0.3 ± 0.005 and 0.35 ± 0.009, respectively.



where the tail exponent is found to be $\beta = 5.3 \pm 0.2$. Again, the shape of
the variety distribution in the Chinese stock markets is qualitatively the same
as in the USA stock markets [23]. Note that the volatilities of most stock
markets have log-normal distributions with power-law tails [16]. However, the
tail distribution of varieties in China's stock markets deviates from the inverse
cubic law.
Fig. 4. Left: Empirical probability density function of the variety $\sigma$ in double
logarithmic coordinates. The continuous line is a parabolic fit while the dashed line
shows a power-law distribution of larger varieties ($\ln \sigma > -3.5$). The tail exponent
is $\beta = 5.3 \pm 0.2$. Right: Log-normal distribution of $\sigma$.
4 Long memory in the ensemble variables

4.1 Detrended fluctuation analysis

Many methods have been developed to extract temporal correlations in time
series, among which detrended fluctuation analysis (DFA) is the most
popular due to its easy implementation and robust estimation even
for short time series [33,34,35]. DFA was originally invented to study the
long-range dependence in coding and noncoding DNA nucleotide sequences [36]
and was then applied to various fields including finance. In order to investigate
the nature of the dependence of the ensemble variables in China's stock markets, we
first adopt the detrended fluctuation analysis.

The DFA is carried out as follows. Consider a time series x(t), t = 1, 2, ..., N.
We first construct the cumulative sum

$$ u(t) = \sum_{i=1}^{t} x(i) , \quad t = 1, 2, \cdots, N . \tag{8} $$

The time interval is then divided into disjoint subintervals of the same length s,
and u(t) is fitted in each subinterval with a polynomial function, which gives $u_s(t)$,
representing the trend in the subintervals. The detrended fluctuation function
F(s) is then calculated as

$$ F^2(s) = \frac{1}{N} \sum_{i=1}^{N} \left[u(i) - u_s(i)\right]^2 . \tag{9} $$

Varying s, one is able to determine the scaling relation between the detrended
fluctuation function F(s) and the time scale s. It is shown that

$$ F(s) \sim s^H , \tag{10} $$

where H is the Hurst index [33,37], which is related to the power
spectrum exponent $\eta$ by $\eta = 2H - 1$ [38,39] and thus to the autocorrelation
exponent $\gamma$ by $\gamma = 2 - 2H$.
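
The DFA steps of Eqs. (8)-(10) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the results reported here: the function names `dfa_fluctuation` and `dfa_hurst`, the default linear detrending (`order=1`), the choice of scales, and the discarding of points that do not fill a complete window are all choices made for the example.

```python
import numpy as np

def dfa_fluctuation(x, scales, order=1):
    """Detrended fluctuation function F(s) for each window size in scales."""
    u = np.cumsum(np.asarray(x, dtype=float))        # Eq. (8)
    F = []
    for s in scales:
        m = len(u) // s                              # number of full windows
        segments = u[:m * s].reshape(m, s).T         # shape (s, m)
        t = np.arange(s)
        coeffs = np.polyfit(t, segments, order)      # one polynomial per window
        trend = np.vander(t, order + 1) @ coeffs     # local trend u_s, shape (s, m)
        # Eq. (9), averaged over the m*s points covered by full windows.
        F.append(np.sqrt(np.mean((segments - trend) ** 2)))
    return np.array(F)

def dfa_hurst(x, scales, order=1):
    """Slope of ln F(s) versus ln s, i.e. the exponent H in Eq. (10)."""
    F = dfa_fluctuation(x, scales, order)
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

# Synthetic check: uncorrelated Gaussian noise should give H close to 0.5.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
scales = [16, 32, 64, 128, 256, 512]
H_noise = dfa_hurst(noise, scales)
```

When a crossover is present, as for the ensemble returns, `dfa_hurst` would be applied separately to the scales below and above the crossover rather than to the full range.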

Figure 5 plots the detrended fluctuation functions F(s) of the ensemble daily
variables $\mu$ and $\sigma$ as a function of the time scale s. There are two scaling laws in
the curve for $\mu$, which are separated at the crossover time lag $\ln s_\times = 3.6$. The
Hurst indices for the two scaling ranges are $H_1 = 0.66 \pm 0.02$ and $H_2 = 0.87 \pm 0.02$,
respectively. For the variety $\sigma$, a sound power-law scaling relation is observed
with a Hurst index $H_3 = 0.93 \pm 0.01$. The strong correlation observed here is
consistent with that in the USA markets, where the autocorrelation exponent
is reported to be $\gamma = 0.230 \pm 0.006$ [23].
Fig. 5. Log-log plot of the detrended fluctuation functions F(s) of the ensemble
daily variables $\mu$ and $\sigma$ with respect to the time scale s. The squares stand for the
results calculated from real data of $\mu$ and the circles represent the real data of $\sigma$.
The plot for $\mu$ is translated vertically for clarity.



4.2 Rescaled range analysis



To further investigate the correlation structure in the ensemble returns and 
varieties, we adopt the well-known R/S analysis. R/S analysis was invented 
by Hurst [40] and then developed by Mandelbrot and Wallis [41,42], known 
also as Hurst analysis or rescaled range analysis. 

Assume that the time series $\{y_i : i = 1, 2, \cdots, s\}$ is a sub-series of successive points taken from a
longer time series $\{x_i : i = 1, 2, \cdots, N\}$. The cumulative deviation
of $\{y_i\}$ is defined by

$$ X_{s,i} = \sum_{j=1}^{i} (y_j - \bar{y}) , \tag{11} $$

where $\bar{y}$ is the sample average of $\{y_i\}$, and the range is given by

$$ R_s = \max_{1 \le i \le s} X_{s,i} - \min_{1 \le i \le s} X_{s,i} . \tag{12} $$

For a time series with long memory, the range rescaled by the sample standard
deviation

$$ S_s = \left[ \frac{1}{s} \sum_{i=1}^{s} (y_i - \bar{y})^2 \right]^{1/2} \tag{13} $$

scales as a power law with respect to the time scale s,

$$ \frac{R_s}{S_s} \sim s^H , \tag{14} $$

as $s \to \infty$. There are different algorithms to implement the R/S analysis,
based on the partition of sub-series $\{y_i\}$. We adopt an algorithm based on
random choices of sub-series of size s and averaging over them [43].
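
The following sketch illustrates Eqs. (11)-(14) together with the idea of averaging over randomly chosen sub-series. The function names, the number of random windows per scale (`n_samples`), the seed, and the uniform choice of window start points are assumptions for this example; the details of the algorithm in [43] may differ.

```python
import numpy as np

def rescaled_range(y):
    """R_s / S_s of one sub-series, following Eqs. (11)-(13)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    X = np.cumsum(dev)                    # cumulative deviation, Eq. (11)
    R = X.max() - X.min()                 # range, Eq. (12)
    S = np.sqrt(np.mean(dev ** 2))        # standard deviation, Eq. (13)
    return R / S

def rs_hurst(x, scales, n_samples=200, seed=0):
    """Hurst index from the slope of ln<R/S> versus ln s, Eq. (14)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    mean_rs = []
    for s in scales:
        # Random start points of sub-series of length s (hypothetical scheme).
        starts = rng.integers(0, len(x) - s + 1, size=n_samples)
        mean_rs.append(np.mean([rescaled_range(x[a:a + s]) for a in starts]))
    return np.polyfit(np.log(scales), np.log(mean_rs), 1)[0]

# Synthetic check on uncorrelated Gaussian noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
H_rs = rs_hurst(noise, [16, 32, 64, 128, 256, 512])
```

For uncorrelated noise the estimate is typically somewhat above 0.5 at small s, a known finite-size effect of the classical R/S statistic, which is one reason to complement it with a formal test.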

The results of the R/S analysis on the daily ensemble returns and varieties
are presented in Fig. 6. We observe that both variables exhibit two scaling
ranges. For the ensemble returns, the crossover occurs at $\ln s_\times = 4.7$, which
should be compared with the crossover at $\ln s_\times = 3.6$ in Fig. 5. The Hurst
index for small s is $H_1 = 0.65 \pm 0.004$, which is very close to $H_1 = 0.66 \pm 0.02$
in Fig. 5. For larger s, we have $H_2 = 0.54 \pm 0.01$, which is much smaller
than $H_2 = 0.87 \pm 0.02$ in the detrended fluctuation analysis. This calls for
further investigation of possible long memory in the daily ensemble returns. For
the daily varieties, the crossover takes place at $\ln s_\times = 2.8$. The Hurst index
for $s < s_\times$ is $H_3 = 0.77 \pm 0.01$, while for $s > s_\times$ we get $H_4 = 0.91 \pm 0.003$,
which is consistent with $H_3 = 0.93 \pm 0.01$ in the detrended fluctuation analysis
illustrated in Fig. 5.



Fig. 6. R/S analysis of the daily ensemble returns and the daily varieties as a
function of $\ln s$. The squares stand for the results calculated from real data of $\mu$
and the circles represent the real data of $\sigma$. The lines are the least squares fits
to the respective results, with $H_1 = 0.65$ and $H_2 = 0.54$ for $\mu$ and $H_3 = 0.77$ and
$H_4 = 0.91$ for $\sigma$. The plot for $\mu$ is translated vertically for clarity.
4.3 Statistical tests of long memory



The information extracted from the DFA and the R/S analysis performed on
the variety is consistent: both methods give a large value of the Hurst index.
However, the situation is quite different where the ensemble return is concerned. The
Hurst indexes for large time scale s obtained from the two methods are both
not far away from H = 0.5. Due to the subtlety of the issue of long memory,
we provide further statistical tests for both ensemble variables, adopting Lo's
modified R/S statistic [44].

Consider a stationary time series of size n. The modified R/S statistic is given
by [44]

$$ Q_n(q) = R_n / S_n(q) , \tag{15} $$

where $R_n$ is the range of cumulative deviations defined in Eq. (12) and $S_n(q)$
is defined by

$$ S_n^2(q) = S_n^2 + 2 \sum_{j=1}^{q} \omega_j(q) \rho_j = S_n^2 + 2 \sum_{j=1}^{q} \left(1 - \frac{j}{q+1}\right) \rho_j , \tag{16} $$

where $S_n$ is the standard deviation defined in Eq. (13) and
$\rho_j = \frac{1}{n} \sum_{i=j+1}^{n} (y_i - \bar{y})(y_{i-j} - \bar{y})$ is the autocovariance of the time series. If the time series has
short-term memory, the statistic

$$ V_n(q) = \frac{1}{\sqrt{n}} Q_n(q) \tag{17} $$

has a finite positive value whose cumulative distribution reads

$$ P(V) = 1 + 2 \sum_{k=1}^{\infty} (1 - 4k^2 V^2) e^{-2k^2 V^2} . \tag{18} $$

The fractiles can be estimated from Eq. (18): for a bilateral test at the 5% significance
level, we have $V_{0.025} = 0.809$ and $V_{0.975} = 1.862$. When the time series has
long-term memory, it is proved that $R_n$ tends to the Brownian bridge variable
$V_H$, while the variable $S^2/(n S_n(q))$ tends to 0 or $\infty$ for large q, that is

$$ V_n(q) = \frac{1}{\sqrt{n}} Q_n(q) \to \begin{cases} 0, & H \in (0, 0.5) \\ \infty, & H \in (0.5, 1) \end{cases} . \tag{19} $$

These properties allow us to distinguish short memory from long memory. The
null hypothesis and its alternative hypothesis may be expressed by

$H_0$: The time series under consideration has short memory;
$H_1$: The time series under consideration has long memory.

The test is performed at the significance level $\alpha$ to accept or reject the null
hypothesis according to whether $V_n(q)$ is contained in the interval $[V_\alpha, V_{1-\alpha}]$
or not, where $P(V_\alpha) = \alpha/2$ and $P(V_{1-\alpha}) = 1 - \alpha/2$. When $V_n(q) \notin [V_\alpha, V_{1-\alpha}]$,
the null hypothesis $H_0$ can be rejected, which indicates that the time series has long
memory.
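
The test procedure can be sketched in a few lines. The code below computes the modified statistic with Bartlett weights $1 - j/(q+1)$, evaluates a truncated version of the limiting distribution, and applies the two-sided 5% decision rule with the fractiles 0.809 and 1.862 quoted above. The function names and the truncation parameter `k_max` are assumptions for this illustration, and the truncated series is not reliable for very small arguments.

```python
import numpy as np

def lo_modified_rs(y, q):
    """Normalized modified R/S statistic; q = 0 gives the classical one."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    X = np.cumsum(dev)
    R = X.max() - X.min()                              # range of cumulative deviations
    var_q = np.mean(dev ** 2)                          # sample variance
    for j in range(1, q + 1):
        rho_j = np.sum(dev[j:] * dev[:-j]) / n         # autocovariance at lag j
        var_q += 2.0 * (1.0 - j / (q + 1.0)) * rho_j   # Bartlett-weighted correction
    return R / np.sqrt(var_q) / np.sqrt(n)

def lo_cdf(v, k_max=100):
    """Limiting cumulative distribution under short memory, truncated at k_max."""
    k = np.arange(1, k_max + 1)
    terms = (1.0 - 4.0 * k**2 * v**2) * np.exp(-2.0 * k**2 * v**2)
    return 1.0 + 2.0 * np.sum(terms)

def rejects_short_memory(y, q, v_low=0.809, v_high=1.862):
    """Two-sided test at the 5% level: True when the statistic is outside the interval."""
    v = lo_modified_rs(y, q)
    return not (v_low <= v <= v_high)

# Tiny deterministic example (alternating series, not market data).
toy = [1, -1, 1, -1]
v_classical = lo_modified_rs(toy, 0)
v_modified = lo_modified_rs(toy, 1)
```

Evaluating `lo_cdf` at 0.809 and 1.862 gives values close to 0.025 and 0.975, which is a quick consistency check on the quoted fractiles.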

We have used q = 36, 72, 108, and 144 in the tests. The tests are performed
on the whole time series from 1994/02/14 to 2004/09/15 and its subintervals.
The results for $\mu$ are presented in Table 1. For the whole time series, the
hypothesis that there is no long memory cannot be rejected. However, the
alternative hypothesis of long memory is significant in several subintervals at the
$\alpha = 5\%$ level. It is thus possible that long memory exists in $\mu$ in the Chinese stock
markets intermittently, which is not unreasonable given the inefficiency of
emerging markets.

Table 1

Statistical test of long memory in the daily ensemble return $\mu$ using the modified
R/S statistic $V_n(q) = Q_n(q)/\sqrt{n}$, which is compared with the classical R/S statistic
$V_n = Q_n/\sqrt{n}$, where $Q_n = R/S$.

Time period               n      V_n     V_n(36)  V_n(72)  V_n(108)  V_n(144)
1994/02/14 - 2004/09/15   2568   1.81    1.66     1.70     1.70      1.72
1994/02/14 - 1999/05/18   1284   2.11*   2.03*    2.17*    2.19*     2.16*
1999/05/19 - 2004/09/15   1284   1.93*   1.55     1.47     1.46      1.50
1994/02/14 - 1996/09/19   642    2.38*   2.35*    2.55*    2.66*     2.88*
1996/09/20 - 1999/05/18   642    2.00*   1.88*    2.00*    2.09*     1.96*
1999/05/19 - 2002/01/14   642    2.45*   1.85     1.73     1.71      1.73
2002/01/15 - 2004/09/15   642    2.89*   2.81*    2.77*    3.01*     3.47*


Table 2 presents the tests for the daily varieties $\sigma$. The long memory
hypothesis is significant at the $\alpha = 5\%$ level for all values of q in all subintervals
investigated. For the whole time series, the null hypothesis $H_0$ is rejected for
q = 36 and q = 72. For larger values of q, the tests show that there is no
significant long memory. Since the definition of the statistic $V_n(q)$ amounts to
removing "autocorrelation" up to q trading days, the modified R/S test is biased
toward over-rejecting long memory [45]. Therefore, we argue that the daily varieties
$\sigma$ are long-term correlated.



Table 2

Statistical test of long memory in the daily variety $\sigma$ using the modified
R/S statistic $V_n(q)$, which is compared with the classical R/S statistic $V_n$.

Time period               n      V_n      V_n(36)  V_n(72)  V_n(108)  V_n(144)
1994/02/14 - 2004/09/15   2568   11.18*   2.67*    2.07*    1.81      1.64
1994/02/14 - 1999/05/18   1284   19.17*   5.43*    4.50*    4.12*     3.85*
1999/05/19 - 2004/09/15   1284   20.11*   4.43*    3.48*    3.08*     2.84*
1994/02/14 - 1996/09/19   642    19.61*   5.70*    4.82*    4.59*     4.49*
1996/09/20 - 1999/05/18   642    30.15*   8.77*    7.66*    7.27*     6.94*
1999/05/19 - 2002/01/14   642    27.09*   6.09*    4.99*    4.67*     4.49*
2002/01/15 - 2004/09/15   642    42.99*   11.52*   9.49*    8.61*     8.14*


5 Conclusion

The ensemble variables $\mu$ and $\sigma$ are important for studying the behavior of
financial markets as a whole complex system, instead of individual stocks. In
this paper, we have studied the statistical properties of the daily ensemble
returns and daily varieties of 500 stocks traded on the Shanghai Stock Exchange
and the Shenzhen Stock Exchange from 1994/02/14 to 2004/09/15.

The daily ensemble returns $\mu$ are found to have exponential distributions
followed by power-law tails. The negative ensemble returns decay more slowly
than the positive part. The negative and positive tail exponents are
$\alpha_- = 3.33 \pm 0.06$ and $\alpha_+ = 2.86 \pm 0.07$. On the other hand, the daily varieties $\sigma$
exhibit a log-normal distribution for moderate values and a power-law form in
the tail for large values. The tail exponent is estimated to be $\beta = 5.3 \pm 0.2$.

There are numerous controversies on the efficiency of the Chinese stock
markets, with a slight bias toward inefficiency [29]. Using detrended fluctuation
analysis, R/S analysis and modified R/S analysis, we have shown that the daily
ensemble returns have long-term memory in several time periods, which is
nevertheless insignificant in the whole time series. Specifically, the long
memory disappears only in the time period from 1999/05/19 to 2002/01/14. This
indicates that the Chinese stock markets do not follow random walks in most
time periods. The long-term memory in the daily varieties is quite strong, with
a large Hurst index H between 0.91 and 0.93.